Skip to content

PKG: Exclude data test files. #19535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Jun 26, 2018
Merged

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Feb 4, 2018

$ ls -lh dist
total 86552
-rw-r--r--  1 taugspurger  staff    10M Feb  4 14:12 pandas-0.23.0.dev0+218.g3f3b4e0bc-cp36-cp36m-macosx_10_12_x86_64.whl
-rw-r--r--  1 taugspurger  staff    12M Feb  4 14:12 pandas-0.23.0.dev0+218.g3f3b4e0bc.tar.gz
-rw-r--r--  1 taugspurger  staff   7.8M Feb  4 12:18 pandas-0.23.0.dev0+219.g4d77cd8e6-cp36-cp36m-macosx_10_12_x86_64.whl
-rw-r--r--  1 taugspurger  staff   9.5M Feb  4 14:11 pandas-0.23.0.dev0+219.g4d77cd8e6.tar.gz

Source: 12M -> 9.5M
Binary: 10M -> 7.8M

Need to do a bit more testing to make sure I didn't break anything.

And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.

Closes #19320
Closes #21436

@TomAugspurger TomAugspurger added Testing pandas testing functions or related to the test suite Build Library building on various platforms labels Feb 4, 2018
@@ -65,9 +65,6 @@ def _skip_if_none_of(module_names):
pytest.skip("Bad version of bs4: 4.2.0")


DATA_PATH = tm.get_data_path()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This types of changes were because pytest.skip can't be called outside of test methods.

setup.py Outdated
@@ -722,11 +722,7 @@ def pxd(name):
maintainer=AUTHOR,
version=versioneer.get_version(),
packages=find_packages(include=['pandas', 'pandas.*']),
package_data={'': ['data/*', 'templates/*'],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone know what the data/* files were referencing? I think it was supposed to be pandas/tests/data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this looks for a data and template sub directory in any of the packages. It doesn’t refer to just one directory.

FWIW I think if those sub directories had an init they would automatically be included per the line above this, but given that’s not the case this helps include those folders relative to any of the packages that are found

@jorisvandenbossche
Copy link
Member

And need to think about how to test this going forward to ensure we don't write tests that aren't skipped if a data file isn't present.

And likewise we should also make sure to by accident not write tests that are skipped due to an error in the path? (maybe a grep for "Data files not included in pandas distribution." in the travis log could help for that?)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Feb 4, 2018 via email



@pytest.fixture(scope='module')
def jsonl_file():
"""Path a JSONL dataset"""
return os.path.join(HERE, 'parser', 'data', 'items.jsonl')
path = os.path.join(HERE, 'parser', 'data', 'items.jsonl')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should just make this a function (or maybe a decorator)

@codecov
Copy link

codecov bot commented Feb 25, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@95427d5). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #19535   +/-   ##
=========================================
  Coverage          ?    91.9%           
=========================================
  Files             ?      153           
  Lines             ?    49544           
  Branches          ?        0           
=========================================
  Hits              ?    45532           
  Misses            ?     4012           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.3% <ø> (?)
#single 41.77% <ø> (?)
Impacted Files Coverage Δ
pandas/util/testing.py 84.98% <ø> (ø)
pandas/util/_test_decorators.py 92.5% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 95427d5...dbe0c57. Read the comment docs.

@pep8speaks
Copy link

pep8speaks commented Feb 25, 2018

Hello @TomAugspurger! Thanks for updating the PR.

  • In the file test_foo.py, following are the PEP8 issues :

Line 3:15: W291 trailing whitespace
Line 9:20: W291 trailing whitespace

Comment last updated on June 21, 2018 at 14:14 Hours UTC

@@ -25,12 +25,12 @@ if [ "$DOC" ]; then
echo "We are not running pytest as this is a doc-build"

elif [ "$COVERAGE" ]; then
echo pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
pytest -s -m "single" --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
echo pytest -s -m "single" -r xXs --strict --cov=pandas --cov-report xml:/tmp/cov-single.xml --junitxml=/tmp/single.xml $TEST_ARGS pandas
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe should make a variable that holds all of these options for both the echo and the run to avoid duplication

if request.config.getoption("--strict-data-files"):
raise ValueError("Failed.")
else:
pytest.skip("Data files not included in pandas distribution.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add the path name in the message

@@ -170,6 +170,8 @@ def test_read_non_existant(self, reader, module, error_class, fn_ext):
])
def test_read_fspath_all(self, reader, module, path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can generally just autouse the datapath fixture

@TomAugspurger TomAugspurger changed the title [WIP]PKG: Exclude data test files. PKG: Exclude data test files. Mar 27, 2018
Build Changes
-------------

- The source and binary distributions no longer include test files, resulting in smaller download sizes. Tests relying on these files will be skipped when using ``pandas.test()``. (:issue:`19320`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test files -> "test data files" or "data files for testing"

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 20, 2018 via email

@jorisvandenbossche
Copy link
Member

That slightly changes the meaning :)

But in the correct sense?

From reading the current whatsnew, I understood that you removed all test .py files, so could not run any test (which of course contradicts with the rest of the sentence, but still ..)

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 20, 2018 via email

@@ -59,7 +60,8 @@ def setup_method(self, method):
self.mixed_frame = _mixed_frame.copy()
self.categorical = _cat_frame.copy()

def teardown_method(self, method):
yield
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah we should really not use this pattern, rather changing to all fixtures. as a temporary workaround this ok, can you create an issue to 'fix' this properly though.

@@ -25,8 +25,7 @@
import pandas.util._test_decorators as td
from pandas.util.testing import makeCustomDataframe as mkdf, network


DATA_PATH = tm.get_data_path()
HERE = os.path.dirname(__file__)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we still need this?

# GH12142 0.17 files packed in P2 can't be read in P3
if (compat.PY3 and version.startswith('0.17.') and
legacy_packer.split('.')[-4][-1] == '2'):
pytest.skip("Files packed in Py2 can't be read in Py3.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the filename here

@jreback jreback added this to the 0.23.2 milestone Jun 21, 2018
@@ -37,6 +37,11 @@ Documentation Changes
-
-

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a ref here as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General note: IMO it is not needed to always ask this of contributors, as this ref is only needed when we actually want to make an explicit link to it from within the rst files (and chances are quite high we will never do this). The ref can always be added at the moment one adds a link.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, but in general its a good practice


DATA_PATH = tm.get_data_path()

def pytest_generate_tests(metafunc):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do people thing about this approach?

This seems to be the recommended why to dynamically generate parametrized fixtures, based on some condition at runtime (the files present in this folder).

Copy link
Contributor Author

@TomAugspurger TomAugspurger Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, in this case I'm not sure it makes sense...

When the files aren't present, paths will be empty.

Copy link
Contributor Author

@TomAugspurger TomAugspurger Jun 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this probably is what we want. When the files are present, we'll get one fixture per file, which is exactly what we want. When they aren't present, nothing is collected (no fixtures), which is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But will it then still fail in the strict case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The files won't be there, so no fixtures are generated. It won't "exist", so there's nothing to fail :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's impossible for dynamically generated fixtures like this (which may be your point).

The alternative is to explicitly list them, which isn't so difficult, as there are only 8. Then we'll have a fixture like html_files. going forward, if we have to add an additional HTML file, we would add it there. I'd prefer being explicit in this case, since there are so few files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the advantage of explicitly listing them is also that we don't need to use this dark pytest magic :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the normal pytest magic :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late here but I don’t mind this approach (not that different from metaclasses in Python). Is it not theoretically possible though to just fail the generated tests if the strict parameter is supplied yet no files are found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I'd prefer to explicitly list them since there are so few.

os.path.join(DATA_PATH, 'html_encoding', '*.html')))
def test_encode(self, f):
_, encoding = os.path.splitext(os.path.basename(f))[0].split('_')
def test_encode(self, html_file):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where html_file is used.

test_foo.py Outdated
@@ -0,0 +1,22 @@
import pytest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😳

@jreback
Copy link
Contributor

jreback commented Jun 26, 2018

lgtm otherwise. merge when ready.

@TomAugspurger TomAugspurger merged commit 36422a8 into pandas-dev:master Jun 26, 2018
@TomAugspurger
Copy link
Contributor Author

Fixed a merge conflict.

@TomAugspurger TomAugspurger deleted the package-size branch June 26, 2018 15:02
jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jul 2, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jul 5, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Build Library building on various platforms Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants